
Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications


Abstract

The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to the study of longitudinal and transverse wave propagation in stratified media. This work demonstrates the potential of the scheme and the relevance of each acceleration strategy for massive FDTD computations. We propose two new implementations of the two-dimensional FDTD scheme, using multi-CPU and multi-GPU respectively. The first implementation uses an open-source Message Passing Interface implementation (OMPI) to fully exploit the resources of a dual-processor workstation with two Intel Xeon processors. In addition, the CPU code version combines the Streaming SIMD Extensions (SSE) and the Advanced Vector Extensions (AVX) with shared memory approaches that take advantage of multi-core platforms. The second implementation, the multi-GPU code version, is based on the Peer-to-Peer communications available in CUDA and runs on two GPUs (NVIDIA GTX 670). The paper then analyses the influence of the different code versions, covering shared memory approaches, vector instructions and multiple processors (both CPU and GPU), and compares them in order to quantify the improvement gained from distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed, and the results show that adding shared memory schemes to CPU computing substantially improves the performance of vector instructions, enlarging the range of simulation sizes that use the CPU cache efficiently. In this case, GPU computing is about twice as fast as the fine-tuned CPU version, for both one and two nodes.
However, for massive computations, explicit vector instructions are not worthwhile: memory bandwidth is the limiting factor, and performance tends toward that of the sequential version with auto-vectorisation and the shared memory approach. In this scenario GPU computing is the best option, since it provides homogeneous behaviour. More specifically, the speedup of GPU computing reaches an upper limit of 12 for both one and two GPUs, while performance reaches peak values of 80 GFlops for one GPU and 146 GFlops for two GPUs. Finally, the method is applied to an earth crust profile in order to demonstrate the potential of the approach and the need for acceleration strategies in this type of application.
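The abstract does not show the authors' kernels, so the following is only a minimal sketch of the idea behind the distributed versions: split the grid between two workers and exchange one boundary ("halo") value per field per time step. It assumes a 1D velocity-stress staggered grid in a homogeneous medium instead of the paper's 2D scheme for stratified media, and plain Python lists in place of MPI ranks or CUDA peer-to-peer copies. The function names, the coefficients `cv` and `cs`, the source position `src` and the split index `m` are all hypothetical.

```python
def run_single(n, steps, cv, cs, src):
    """1D velocity-stress FDTD on one domain; returns the velocity field."""
    v = [0.0] * n          # particle velocity at nodes 0..n-1
    s = [0.0] * (n - 1)    # stress at half-nodes; s[i] sits between v[i] and v[i+1]
    s[src] = 1.0           # initial stress pulse (hypothetical source)
    for _ in range(steps):
        for i in range(1, n - 1):          # end nodes stay at 0 (rigid walls)
            v[i] += cv * (s[i] - s[i - 1])
        for i in range(n - 1):
            s[i] += cs * (v[i + 1] - v[i])
    return v


def run_split(n, steps, cv, cs, src, m):
    """Same scheme on two subdomains split at node m (assumes 1 <= m < n).

    Left worker owns v[0..m-1] and s[0..m-1], plus a ghost copy of v[m].
    Right worker owns v[m..n-1] and s[m..n-2], plus a ghost copy of s[m-1].
    """
    vL = [0.0] * (m + 1)   # last entry is the ghost of the right worker's v[m]
    sL = [0.0] * m
    vR = [0.0] * (n - m)
    sR = [0.0] * (n - m)   # sR[0] is the ghost of s[m-1]; sR[j+1] holds s[m+j]
    if src < m:
        sL[src] = 1.0
    else:
        sR[src - m + 1] = 1.0
    sR[0] = sL[m - 1]      # initial halo fill
    for _ in range(steps):
        # velocity update, independent on each worker
        for i in range(1, m):
            vL[i] += cv * (sL[i] - sL[i - 1])
        for j in range(n - m - 1):
            vR[j] += cv * (sR[j + 1] - sR[j])
        vL[m] = vR[0]      # halo exchange: right sends its first velocity
        # stress update, independent on each worker
        for i in range(m):
            sL[i] += cs * (vL[i + 1] - vL[i])
        for j in range(n - m - 1):
            sR[j + 1] += cs * (vR[j + 1] - vR[j])
        sR[0] = sL[m - 1]  # halo exchange: left sends its last stress
    return vL[:m] + vR     # drop the ghost and stitch the fields together
```

Calling `run_single(64, 100, 0.5, 0.5, 20)` and `run_split(64, 100, 0.5, 0.5, 20, 32)` should give matching velocity fields, since each worker only ever needs one neighbouring value per update. In a real multi-GPU or MPI code the two assignments marked as halo exchanges would become device-to-device copies or messages, and their cost relative to the interior updates is one factor that limits scaling.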
